## Review Problem 29

- The pipelined CPU has the stage delays shown.
- Is it better to speed up the ALU by 10 ns, or the Data Memory by 2 ns?
- Does your answer change for a single-cycle CPU? Yes. This speeds up the longest path.
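The comparison can be sketched numerically. The stage delays below are hypothetical values chosen only for illustration (the figure's actual delays are not reproduced here), and the function names are likewise assumptions. The sketch relies on two facts: a pipelined clock is set by the slowest stage, while a single-cycle clock must cover the sum of all stages.

```python
# Hypothetical stage delays in ns; assumed for illustration, not taken from the slide.
STAGES = {"IF": 20, "REG_READ": 5, "ALU": 25, "MEM": 30, "REG_WRITE": 5}


def pipelined_period(delays):
    """Pipelined CPU: the slowest single stage sets the clock period."""
    return max(delays.values())


def single_cycle_period(delays):
    """Single-cycle CPU: the clock must cover every stage in sequence."""
    return sum(delays.values())


def speed_up(delays, stage, ns):
    """Return a copy of the delays with one stage made faster by `ns`."""
    faster = dict(delays)
    faster[stage] -= ns
    return faster


if __name__ == "__main__":
    options = [
        ("baseline", STAGES),
        ("ALU -10 ns", speed_up(STAGES, "ALU", 10)),
        ("MEM -2 ns", speed_up(STAGES, "MEM", 2)),
    ]
    for label, delays in options:
        print(label, pipelined_period(delays), single_cycle_period(delays))
```

With these assumed numbers, the memory speedup helps the pipelined clock (30 ns to 28 ns) while the ALU speedup does not (still 30 ns). For the single-cycle clock, the ALU speedup wins (85 ns to 75 ns versus 85 ns to 83 ns).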



# Solution #3: Branch Delay Slot

Redefine branches: the instruction directly after a branch is always executed.

The instruction after the branch is the delay slot.

Compiler/assembler fills the delay slot.

CBZ X2, FOO
ADD X1, X0, X4

SUB X2, X0, X3
ADD X1, X0, X4
CBZ X1, FOO

FOO: ADD X1, X2, X0

ADD X1, X3, X3
CBZ X1, FOO
ADD X1, X0, X4

Ex. #3 wastes 1 cycle: ADD X31, X31, X31

ADD X1, X0, X4
CBZ X1, FOO
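A minimal sketch of how an assembler might fill the slot, under a deliberately simplified rule: only the instruction immediately before the branch is considered, and it may move into the slot only if it does not write a register the branch tests. The `Instr` type and `fill_delay_slot` function are hypothetical names introduced for illustration, and `ADD X31, X31, X31` is used as the no-op.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str
    dest: Optional[str]            # register written; None for branches
    srcs: Tuple[str, ...]          # registers read
    target: Optional[str] = None   # branch label, if any


# Writing X31 has no effect, so this ADD acts as a no-op.
NOP = Instr("ADD", "X31", ("X31", "X31"))


def fill_delay_slot(before: Instr, branch: Instr) -> List[Instr]:
    """Return the pair reordered so the slot after `branch` is filled.

    Simplified rule (an assumption of this sketch): move `before` into the
    slot only if it writes a register that the branch does not read.
    Otherwise keep the order and pad the slot with a no-op.
    """
    if before.dest is not None and before.dest not in branch.srcs:
        return [branch, before]
    return [before, branch, NOP]


if __name__ == "__main__":
    add = Instr("ADD", "X1", ("X0", "X4"))
    print(fill_delay_slot(add, Instr("CBZ", None, ("X2",), "FOO")))
    print(fill_delay_slot(add, Instr("CBZ", None, ("X1",), "FOO")))
```

In the first call the ADD is independent of the branch condition and moves into the slot. In the second the branch tests X1, so the slot is padded with a no-op and one cycle is wasted.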

#### Data Hazards



## Design Register File Carefully



What if reads see the value after a write during the same cycle?

ADD <u>X0</u>, X1, X2

SUB X3, <u>X0</u>, X4

AND X5, <u>X0</u>, X6

ORR X7, <u>X0</u>, X8

EOR X9, X0, X10
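The behavior in question can be modeled with a small sketch in which the write in a cycle is applied before that cycle's reads, so a read of the same register returns the new value. The class and method names are hypothetical, and the model treats X31 as the always-zero register.

```python
class RegisterFile:
    """Toy model: within one cycle, the write happens before the reads."""

    def __init__(self):
        self.regs = [0] * 32

    def cycle(self, write=None, reads=()):
        """Apply an optional (reg, value) write, then return the read values."""
        if write is not None:
            reg, value = write
            if reg != 31:          # X31 is treated as hard-wired zero
                self.regs[reg] = value
        return tuple(self.regs[r] for r in reads)


if __name__ == "__main__":
    rf = RegisterFile()
    # Writeback of X0 and a read of X0 by a later instruction in the same cycle.
    print(rf.cycle(write=(0, 42), reads=(0, 8)))
```

Under this model, an instruction reading its registers in the same cycle that an earlier instruction writes back needs no forwarding path.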



#### Forwarding

Add logic to pass the last two values from the ALU output to the ALU input(s) as needed.

Forward the ALU output to later instructions.

ADD <u>X0</u>, X1, X2

SUB X3, <u>X0</u>, X4

AND X5, <u>X0</u>, X6

ORR X7, <u>X0</u>, X8

EOR X9, X0, X10



## Note: Use THIS one, not the book one.

#### Forwarding (cont.)

Remember destination register for operation. Requires values from last two ALU operations.

Compare sources of current instruction to destinations of previous 2.
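The comparison described above can be sketched as a selection function for one ALU operand. The function name and the `"PREV1"`/`"PREV2"`/`"REGFILE"` labels are hypothetical. The sketch assumes the most recent matching destination takes priority and that X31 is never forwarded.

```python
def forward_select(src, prev1_dest, prev2_dest):
    """Pick the source of one ALU operand.

    src         register the current instruction reads
    prev1_dest  destination of the instruction one ahead (None if it writes nothing)
    prev2_dest  destination of the instruction two ahead (None if it writes nothing)
    """
    if src == "X31":
        return "REGFILE"      # assumed: the zero register is never forwarded
    if src == prev1_dest:
        return "PREV1"        # newest value wins if both destinations match
    if src == prev2_dest:
        return "PREV2"
    return "REGFILE"


if __name__ == "__main__":
    # ADD X0, X1, X2 followed by SUB, AND, ORR, each reading X0.
    print(forward_select("X0", "X0", None))   # SUB: ADD is one ahead
    print(forward_select("X0", "X3", "X0"))   # AND: ADD is two ahead
    print(forward_select("X0", "X5", "X3"))   # ORR: ADD is three ahead
```

For the third case the value comes from the register file, which works only if a read sees a value written in the same cycle.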



### Data Hazards on Loads



# Data Hazards on Loads (cont.)

#### Solution

Force the compiler to not allow register reads within a cycle of a load.

Use the same forwarding hardware & register file for hazards 2+ cycles later.

Fill the delay slot, or insert a no-op.

Branch delay slot: the instruction after a branch. Load delay slot: the instruction after a load. A delay slot means the programmer must deal with it: the instruction in a load delay slot cannot read the load's destination register.
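A minimal sketch of the "insert no-op" option, assuming instructions are represented as `(op, dest, srcs)` tuples and that `LDUR` is the only load. The representation and the function name are hypothetical. A no-op is placed after a load only when the very next instruction reads the loaded register.

```python
# Writing X31 has no effect, so this ADD acts as a no-op.
NOP = ("ADD", "X31", ("X31", "X31"))


def insert_load_nops(program):
    """Return a copy of `program` with a no-op in each unsafe load delay slot."""
    out = []
    for i, (op, dest, srcs) in enumerate(program):
        out.append((op, dest, srcs))
        if op != "LDUR" or i + 1 >= len(program):
            continue
        next_srcs = program[i + 1][2]
        if dest in next_srcs:      # next instruction would read the loaded register
            out.append(NOP)
    return out


if __name__ == "__main__":
    program = [
        ("LDUR", "X0", ("X1",)),
        ("ADD", "X2", ("X0", "X3")),   # reads X0 right after the load
        ("SUB", "X4", ("X0", "X5")),   # two slots later: forwarding suffices
    ]
    for instr in insert_load_nops(program):
        print(instr)
```

A compiler would normally try to fill the slot with an independent instruction first. This sketch only covers the fallback.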